Phonetic Inventory for an Arabic Speech Corpus

Authors

  • Nawar Halabi
  • Mike Wald

Abstract

Corpus design for speech synthesis is a well-researched topic in languages such as English, but far less so for Modern Standard Arabic (MSA), and there is a tendency to focus on methods (usually greedy ones) that automatically generate the orthographic transcript to be recorded. In this work, MSA phonetics and phonology are studied in order to define criteria for a greedy method that creates a speech corpus transcript for recording. Using these optimisation methods with different parameters, the dataset is reduced several times, yielding a much smaller dataset with the same phonetic coverage as before the reduction; this output transcript is chosen for recording. This is part of a larger effort to create a completely annotated and segmented speech corpus for MSA.
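The abstract does not spell out the selection criteria, so the following is only a minimal sketch of the general greedy idea, not the authors' method. It assumes each candidate sentence already has a phonetic transcription and that coverage is measured over n-phone units (diphones by default, with `n` standing in for "different parameters"). The function names, the toy transliterated phone sequences and the tie-break rule are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Unit = Tuple[str, ...]

def phone_ngrams(phones: List[str], n: int = 2) -> Set[Unit]:
    """Return the set of n-phone units (n=2 gives diphones) in a phone sequence."""
    return {tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)}

def covered_units(corpus: Dict[str, List[str]], ids: List[str], n: int = 2) -> Set[Unit]:
    """Union of the n-phone units found in the given sentences."""
    result: Set[Unit] = set()
    for sid in ids:
        result |= phone_ngrams(corpus[sid], n)
    return result

def greedy_select(corpus: Dict[str, List[str]], n: int = 2) -> List[str]:
    """Greedy set cover: pick sentences until every unit in the corpus is covered.

    Each step takes the sentence adding the most uncovered units; ties go to the
    shorter sentence, then to the smaller id (a hypothetical tie-break rule).
    """
    units = {sid: phone_ngrams(ph, n) for sid, ph in corpus.items()}
    target: Set[Unit] = set().union(*units.values())
    covered: Set[Unit] = set()
    selected: List[str] = []
    remaining = sorted(units)  # sorted so that ties resolve deterministically
    while covered != target:
        # max() returns the first maximal element, i.e. the smallest id among ties.
        best = max(remaining, key=lambda s: (len(units[s] - covered), -len(corpus[s])))
        selected.append(best)
        covered |= units[best]
        remaining.remove(best)
    return selected

# Toy corpus of hypothetical sentence ids mapped to transliterated phone sequences.
corpus = {
    "s1": ["k", "a", "t", "a", "b", "a"],
    "s2": ["k", "a", "t", "a"],
    "s3": ["b", "a", "b"],
    "s4": ["d", "a", "r", "s"],
}

print(greedy_select(corpus))       # ['s1', 's4']
print(greedy_select(corpus, n=1))  # ['s4', 's1']
```

The selected subset covers every unit that the full corpus covers, which is the "same phonetic coverage" property described above; rerunning with a different unit size changes which (and how many) sentences survive the reduction.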

Similar resources

Phonetic tool for the Tunisian Arabic

A phonetic dictionary is an essential component of a speech recognition system or a speech synthesis system. Our work targets the automatic generation of a pronunciation dictionary for Tunisian Arabic, in particular in the field of rail transport. To do this, we created two phonetic tools, one for vowelized and one for unvowelized words in Tunisian Arabic. The proposed method to automatically genera...

A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition

In this paper we describe an effort to create a corpus and phonetic dictionary for Tunisian Arabic Automatic Speech Recognition (ASR). The corpus, named TARIC (Tunisian Arabic Railway Interaction Corpus), has a collection of audio recordings and transcriptions from dialogues in the Tunisian Railway Transport Network. The phonetic (or pronunciation) dictionary is an important ASR component that s...

Prosodic Alternative Units in a Mandarin Chinese Speech Synthesizer

The Mandarin Chinese synthesis component of the Dresden Speech Synthesizer DreSS is based on an inventory of syllabic units. The inventory contains all Chinese syllables with the possible tones in up to three phonetic variations for correct modeling of cross-syllable coarticulation effects. In order to improve the naturalness and fluency of the synthesized speech, the inventory was comple...

Slovak Unit-Selection Speech Synthesis: Creating a New Slovak Voice within a Czech TTS System ARTIC

ARTIC (Artificial Talker in Czech) is a corpus-based text-to-speech (TTS) system that can synthesise arbitrary text, mainly for the Czech language. Basically, two versions of ARTIC are available: a single unit instance system (also known as fixed-inventory synthesis) with the quality of resulting speech limited by the fixed inventory, and a multiple unit instance system with the quality p...

Arabic Phonetic Dictionaries for Speech Recognition

Phonetic dictionaries are essential components of large-vocabulary speaker-independent speech recognition systems. This paper presents a rule-based technique to generate phonetic dictionaries for a large vocabulary Arabic speech recognition system. The system used conventional Arabic pronunciation rules, common pronunciation rules of Modern Standard Arabic, as well as some common dialectal case...

Publication date: 2016